Usefulness in a data science workflow

Exploratory work:

Expository work:

Share-able, portable, composable (i.e., reports, dashboards, etc)

Grammar of graphics

  • Leland Wilkinson (1980): A grammar of graphics is a framework which follows a layered approach to describe and construct visualizations or graphics in a structured manner.

Many libraries abide by this grammar: ggplot2, plotly, D3 and many others.

Grammar of graphics and ggplot2

The gg in ggplot2 stands for Grammar of Graphics.

“The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.”

— Hadley Wickham

Advantages

  • Functional data visualization
    • Clean and wrangle data
    • Map data to visual elements
    • Tweak scales, guides, axis, labels and theme
  • Ease of iteration
  • Ease in maintaining consistency

Prepare your system

  1. Update R and RStudio to the latest, released version: https://www.rstudio.com/products/rstudio/download/

  2. Download and load the following packages:

install.packages(c("devtools", "knitr", 
                   "usethis", "tidyverse", 
                   "plotly", "r2d3", 
                   "crosstalk", "shiny", 
                   "flexdashboard"))

Data source: espnscraperR, which collects or scrapes QBR, NFL standings, and stats from ESPN

remotes::install_github("jthomasmock/espnscrapeR")
library(tidyverse)
library(espnscrapeR)

nfl_qbr <- 
  crossing(season = 2020, week = 1:6) %>% 
  pmap_dfr(espnscrapeR::get_nfl_qbr)

Consider working in the preview version of RStudio as well for latest features (that are more stable than the daily build): https://www.rstudio.com/products/rstudio/download/preview/

Plotly

  • Plotly R package allows you to create a variety of interactive graphics

  • Two ways:

    • transforming a ggplot2 object into a plotly object via ggplotly()
    • directly initializing a plotly object with plot_ly(), plot_geo() or plot_mapbox()
  • There are strengths, weaknesses to either approach

  • Learning both will pay dividends, as they share a common grammar and can be reused

Intro to ggplotly()

library(plotly)

nfl_qbr_plot <- nfl_qbr %>% 
  ggplot(aes(x = total_epa, y = qbr_total, label = short_name)) +
  geom_point() +
  geom_smooth(method = "lm") +
  geom_label() +
  theme_minimal() +
  labs(
    x = "EPA", y = "QBR", 
    title = "EPA is correlated with QBR"
    )

nfl_qbr_plot

ggplotly(nfl_qbr_plot)

Intro to plot_ly()

  • Any plot made with plot_ly() uses the JavaScript library plotly.js

  • plot_ly() interfaced directly with plotly.js

  • plot_ly has arguments that fit into the “Grammar of Graphics”:

    • x, y
    • color, stroke, span, symbol, linetype
  • There is a family of add_* functions:

    • add_lines()
    • add_bars
    • add_histogram2d()
    • add_contour
    • add_boxplot
    • …and many more!

The plotly package takes a pure functional approach to a layered grammar of graphics, meaning (almost) every function anticipates a plotly object as input to it’s first argument and returns a modified version of that plotly object.

Example: the layout() function anticipates a plotly object in it’s first argument and its other arguments add and/or modify various layout components of that object (e.g., the title):

layout(
  plot_ly(nfl_qbr %>% filter(game_week == 3), x = ~short_name, y = ~total_epa, color = ~rank),
  title = "QB Total EPA for Week 3"
)

Notice that data manipulation verbs from the dplyr package (such as filter()) can be used to transform the data underlying a plotly object!

Or, in a more cleaner fashion:

nfl_qbr %>% 
  filter(game_week == 3) %>% 
  plot_ly(x = ~short_name, y = ~total_epa, color = ~rank) %>% 
  add_bars() 

You can also add a layer of text using the summarized counts. Note that the global x mapping, as well as the other mappings local to this text layer (text and y), reflect data values from step 3:

nfl_qbr %>% 
  filter(game_week == 3) %>% 
  plot_ly(x = ~short_name, y = ~total_epa, color = ~rank) %>% 
  add_bars() %>% 
  add_text(
    text = ~team,
    textposition = "top middle",
    cliponaxis = F
  )

To recap:

  1. Globally assign short_name,total_epa, rank to x, y and color, respectively
  2. Add a bars layer (which inherits the y from plot_ly)
  3. Use dplyr verbs to modify the data underlying the plotly object
  4. Add additional layers (like a layer of text)
  5. Share

D3 and Leaflet

D3 and Leaflet are also two popular JavaScript libraries whose R interfaces is based on the grammar of graphics. Like plotly, D3 and Leaflet can be used in a data pipeline with other R tools to clean, wrangle, transform and model with consistency and ease of iteration.

D3

The amazing dreamRs team is behind the {r2d3maps} package which allows you to make D3 maps in a flash. Check out their documentation to get a taste of what you can make: https://github.com/dreamRs/r2d3maps.

# Packages ----------------------------------------------------------------
library(janitor)
library(albersusa)
library(sf)
library(tigris)
library(r2d3maps)
library(rmapshaper)

# County geometries -------------------------------------------------------

cty_sf <- counties_sf("longlat") # from albersusa package 
plot(st_geometry(cty_sf)) # checking geometries, looks good

cty_sf <- ms_simplify(cty_sf) # make the geometries simpler for faster rendering 


# Data ---------------------------------------------------------------------

# CDC's social vulnerability index variables: https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf

svi <- read_csv("/Users/atoumi/Desktop/interactive_graphics/svi_county.csv") %>% 
  clean_names() %>% # make everything snake_case
  filter(across(where(is.numeric), ~. >= 0)) # because NA's are coded as -Inf

# join data to geometries
cty_sf_joined <- cty_sf %>% geo_join(svi, by_sp = "fips", by_df = "fips")

# make d3 map
d3_map(shape = cty_sf_joined, projection = "Albers") %>%
  add_labs(caption = "Viz: Asmae Toumi | Data: CDC") %>% 
  add_continuous_breaks(var = "rpl_theme1", palette = "Reds") %>%
  add_legend(title = "Socioeconomic Vulnerability (percentile)") %>%
  add_tooltip("<b>{location}</b>: {rpl_theme1}")

Leaflet

Like plotly, leaflet has an R interface. Unlike D3, Leaflet is more suited for maps. Leaflet is well-suited for the web, highly customizable and integrates well with R’s ecosystem.

library(tigris)

cty <- counties(cb = TRUE, resolution = "20m") %>% rename(fips = GEOID)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  34%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
library(tidycensus)

us_county_income <- 
  get_acs(geography = "county", variables = "B19013_001") %>% 
  select(fips = GEOID, name = NAME, income = "estimate") %>% 
  drop_na()

cty <- cty %>% 
  geo_join(us_county_income, by_df = "fips", by_sp = "fips")

library(leaflet)
library(RColorBrewer)
library(htmltools)

pal <- colorNumeric("YlOrRd", domain = cty$income)

labels <- 
  sprintf(
    "<strong>%s</strong><br/> Income: %s $",
    cty$name, cty$income) %>% 
  lapply(htmltools::HTML)

leaflet(cty) %>%
  setView(-96, 37.8, 4) %>%
  addProviderTiles("CartoDB.PositronNoLabels") %>%
  addPolygons(
    fillColor = ~pal(income), 
    weight = 1,
    opacity = 1,
    color = "white",
    fillOpacity = 0.7,
    highlight = highlightOptions(
      weight = 2,
      color = "#666",
      fillOpacity = 0.7,
      bringToFront = TRUE),
    label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal"),
      textsize = "15px",
      direction = "auto")) %>%
  addLegend(pal = pal,
            values = cty$income,
            position = "bottomright",
            title = "Income (5-year ACS)",
            labFormat = labelFormat(suffix = "$"),
            opacity = 0.8,
            na.label = "No data")

See leaflet.R for standalone code.

Sharing

Saving plotly/ggplotly objects

  • Plotly/ggplotly objects are also htmlwidgets so any methods used for saving htmlwidgets also works for plotly/ggplotly objects:
    • Saving and embedding HTML: htmlwidgets::saveWidget() function (also works for leaflet, DT, etc.)
    • Exporting static images: orca() as .svg or .pdf

Combining multiple views

Arranging multiple graphs

p1 <- plot_ly(nfl_qbr, x = ~total_epa, y = ~qbr_total) %>% 
  add_lines(name = "qbr total")

p2 <- plot_ly(nfl_qbr, x = ~total_epa, y = ~qb_plays) %>% 
  add_lines(name = "qb plays")

subplot(p1, p2, shareX = T)

Animation to combine multiple plots

gg <- nfl_qbr %>%
  filter(team %in% c("ARI", "ATL", "CHI", "CAR", "DAL")) %>% 
  ggplot(aes(x = total_epa, y = qbr_total, color = team)) +
  geom_point(aes(frame = game_week)) 

ggplotly(gg)

Arranging multiple views

Since plotly objects are also htmlwidgets, any method that works for arranging htmlwidgets also works for plotly objects.

Common ways to arrange components in a single web-page:

  1. flexdashboard: An R package for arranging components into an opinionated dashboard layout. This package is essentially a special rmarkdown template that uses a simple markup syntax to define the layout.

  2. Bootstrap’s grid layout: Both the crosstalk and shiny packages provide ways to arrange numerous components via Bootstrap’s (a popular HTML/CSS framework) grid layout system.

  3. CSS flexbox: If you know some HTML and CSS, you can leverage CSS flexbox to arrange components via the htmltools package.

We’ll see how crosstalk, flexdashboard and DT work together with Demo.Rmd script. We’ll also see a Shiny app with the app.R script.

For more on shiny, check out this talk by Carson Sievert: https://talks.cpsievert.me/20191115/#1

Lessons learned

  • The amount of interactive techniques is overwhelming
  • Focus should be on identifying an analysis task/question first
  • There will be different ways of doing the same thing, pick the most sensible to you

Ideally, according to Carson Sievert:

Go for R packages/technologies for creating interactive web graphics which:

  • Don’t require knowledge of web technologies (start-up cost)
  • Produce standalone HTML whenever possible (hosting/maintenance cost)
  • Work well with other “tidy” tools in R (iteration cost)
  • Link to external vis libraries (startover cost)
  • Easy to use interactive techniques that support data analysis tasks (discovery cost)

References